Star clusters are often hard to find, as they may lie in a dense field of background objects or
because, in the case of embedded clusters, they are surrounded by a more dispersed population of
young stars. This paper discusses four algorithms that have been developed to identify clusters as
stellar density enhancements in a field, namely stellar density maps from star counts, the nearest
neighbour method, the Voronoi tessellation, and the separation of minimum spanning trees. These
methods are tested and compared to each other by applying them to artificial clusters of different
sizes and morphologies. While distinct centrally concentrated clusters are detected by all methods,
clusters with low overdensity or highly hierarchical structure are only reliably detected by methods
with inherent smoothing (star counts and nearest neighbour method). Furthermore, the algorithms
differ strongly in computation time and in the additional parameters they provide. Therefore, the
method to choose primarily depends on the size and character of the investigated area and the
purpose of the study.
1 Introduction

Most stars are born in clusters and even though a large frac- 
tion of them dissolve at an early stage, star clusters remain 
important building blocks of galaxies, holding crucial clues 
to star formation, stellar evolution and galactic dynamics. 
While the most prominent clusters have been found by eye 
(e.g. Messier 1774), today more sophisticated methods are
needed. 

Star clusters are usually not found in isolation, but rather 
surrounded by a distributed stellar population or unrelated 
background objects. Molecular clouds, the places where stars
are born, contain embedded clusters as well as a distributed
population of young stellar objects (YSOs). In Galactic molecular
cloud complexes only roughly 50 per cent of the
YSOs are found in large clusters; the rest are found in smaller
groups (n < 10) or in relative isolation (e.g. Hatchell et
al. 2005; Schmeja et al. 2008; Román-Zúñiga et al. 2008b).
Open clusters, which (unlike globular clusters) usually do
not show a strong radial density gradient, often do not stand
out prominently from the field of unrelated background stars.
Therefore, methods to detect and delineate clusters are needed.
Especially for the statistical analysis and comparison of
large samples of star clusters it is important to identify all 
clusters in a homogeneous way, and the application of au- 
tomated cluster searches in large-scale surveys requires ef- 
ficient algorithms. 

Finding connected objects or clustering is a well-known problem in pattern recognition and
classification. A general review and evaluation of statistical cluster finding algorithms is given
e.g. in Hartigan (1975, 1985). In this work, we will focus on the specific problem of stellar
clusters. Since for this purpose clusters are defined as having a density higher than the
surrounding field, the methods rely on determining the stellar surface density and consider as
clusters all regions above a certain deviation from the background level. Since open clusters are
gravitationally bound structures consisting of roughly coeval stars, detecting density enhancements
is only the first step to identify potential clusters. Stellar density enhancements can also be
caused by chance alignments or holes in foreground extinction (e.g. Odenkirchen & Soubiran 2002;
Froebrich et al. 2007, 2008; Maciejewski & Niedzielski 2008; Moni Bidin et al. 2010). Therefore, to
verify whether stars are really physically related in an open cluster, additional criteria, such as
radial density profiles (e.g. Gaussian or King), colours or kinematics, are needed (e.g. Platais
2001; Kharchenko et al. 2004). As there is a smooth transition from embedded clusters to the more
dispersed YSO population in a molecular cloud (e.g. Elmegreen 2010; Bressert et al. 2010), any
delimitation of the boundaries of embedded clusters will be somewhat arbitrary.

Many methods to identify star clusters in a field have 
been derived and successfully applied. However, a thorough 
evaluation and comparison of these methods has never been 
done. Here we discuss the most important algorithms and 
compare them with each other by applying them to artificially
created clusters. The investigated algorithms are described
in Section 2 and the test cases of artificial clusters
in Section 3. Section 4 describes how the algorithms are applied
to the model clusters, while in Sections 5 and 6 the
results of the comparison are presented and discussed.
2 Cluster finding algorithms

The algorithms are described and tested for projected, two- 
dimensional clusters, but all of them can be applied to three- 
dimensional distributions (like the results of simulations or 
future 3D observational data) as well. 

2.1 Star counts 

An obvious and straightforward approach is finding variations in the stellar density by simple star
counts. This requires dividing the investigated region into smaller bins of equal size and
determining the number of stars in each bin. Bins with counts greater than some significance
threshold ($\sim 2$ to $5\sigma$) above the mean value can be considered as the locations of
potential clusters. The binning size has to be chosen carefully such that the number of objects per
bin is neither too small (prohibiting a meaningful analysis) nor too large (hiding existing
features). Usually the region surveyed is subdivided into a rectilinear grid of overlapping squares
that are separated by half the side length of an individual square (the Nyquist spatial sampling
interval) (Lada & Lada 1993; Carpenter et al. 1995, 2000; Kumar et al. 2004, 2006). The method can
be refined by using different bin sizes in order to investigate large-scale structures as well as
smaller-scale subclustering (Kumar et al. 2004, 2006; Kirsanova et al. 2008) or by smoothing the
binned data over adjacent bins (Lada et al. 1991; Karampelas et al. 2009).

As it only requires the mapping of the stellar surface 
density, the star count method is easy to implement and ver- 
satile, at the cost of a few shortcomings. Once large datasets 
with strongly varying stellar densities and cluster sizes are 
considered, the a priori choice of an adequate bin size be- 
comes difficult. 
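
The star count procedure described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the implementation used for the tests in this paper: the function names `star_count_map` and `overdense_bins`, the toy field, and the Gaussian test group are hypothetical, and the bin size (0.5), sampling step (0.25), $3\sigma$ threshold and background region ($|y| > 2$) are simply the values quoted in Sect. 4.

```python
import math
import random
import statistics

def star_count_map(points, half_width=5.0, step=0.25, cells_per_bin=2):
    """Surface density in overlapping square bins.

    Bins have side cells_per_bin * step and are shifted by `step`, so the
    defaults give bins of 0.5 sampled every 0.25 (half a bin width).
    Returns a dict mapping the bin centre (x, y) to stars per unit area.
    """
    n_cells = int(round(2 * half_width / step))
    fine = [[0] * n_cells for _ in range(n_cells)]
    for x, y in points:
        ix = math.floor((x + half_width) / step)
        iy = math.floor((y + half_width) / step)
        if 0 <= ix < n_cells and 0 <= iy < n_cells:
            fine[iy][ix] += 1
    bin_area = (cells_per_bin * step) ** 2
    n_bins = n_cells - cells_per_bin + 1
    density = {}
    for iy in range(n_bins):
        for ix in range(n_bins):
            # each overlapping bin is the sum of cells_per_bin^2 fine cells
            count = sum(fine[iy + dy][ix + dx]
                        for dy in range(cells_per_bin)
                        for dx in range(cells_per_bin))
            centre = (-half_width + (ix + cells_per_bin / 2) * step,
                      -half_width + (iy + cells_per_bin / 2) * step)
            density[centre] = count / bin_area
    return density

def overdense_bins(density, k_sigma=3.0, background_limit=2.0):
    """Bins more than k_sigma above the mean of the bins with |y| > background_limit."""
    background = [rho for (_, y), rho in density.items() if abs(y) > background_limit]
    threshold = statistics.mean(background) + k_sigma * statistics.pstdev(background)
    return threshold, [c for c, rho in density.items() if rho > threshold]

# Toy field (assumption): 4000 uniform background stars plus a compact group of 200.
rng = random.Random(1)
field = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(4000)]
field += [(rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for _ in range(200)]
threshold, flagged = overdense_bins(star_count_map(field))
```

Counting on a fine grid of width `step` and summing neighbouring cells avoids looping over all stars for every overlapping bin; it requires the bin size to be an integer multiple of the step, which holds for half-bin sampling.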



2.2 Nearest neighbour density 

The nearest neighbour (NN) method is a simple and popular method for statistical pattern recognition
(e.g. Cover & Hart 1967), in classification usually more accurately called the k-nearest neighbours
method. It has been widely used in many fields of science, in particular in ecology (e.g. Thompson
1956; Franco-Lopez et al. 2001; Makela & Pekkarinen 2004). The method was introduced in astronomy
by Casertano & Hut (1985) based on earlier work by von Hoerner (1963). While the method has been
frequently applied to star clusters using the first nearest neighbour (e.g. Gomez et al. 1993), the
more advanced approach described below has been applied to star clusters only recently (Gutermuth
et al. 2005, 2008a, 2008b; Román-Zúñiga et al. 2008; Jørgensen et al. 2008; Schmeja et al. 2008,
2009; Wang et al. 2009; Kirk et al. 2009; Ferreira 2010; Gouliermis et al. 2010). A related
algorithm has been described by Gladwin et al. (1999).

The NN method estimates the local source density $\rho_j$ by measuring the distance from each
object to its $j$th nearest neighbour:

$$ \rho_j = \frac{j-1}{S(r_j)}\, m \qquad (1) $$

(Casertano & Hut 1985), where $r_j$ is the distance of a star to its $j$th nearest neighbour,
$S(r_j)$ the surface area with the radius $r_j$ and $m$ the average mass of the sources ($m = 1$
when considering number densities).
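
As an illustration of the density estimate of Casertano & Hut (1985), $\rho_j = (j-1)\,m / S(r_j)$, the following sketch computes the $j$th NN surface density for a list of two-dimensional positions. It uses the brute-force distance search discussed further below; the function name `nn_density` is hypothetical, and $S(r_j) = \pi r_j^2$ is assumed for the projected case.

```python
import math

def nn_density(points, j=20, mass=1.0):
    """j-th nearest-neighbour surface density for each 2D point (brute force)."""
    if j < 2 or j >= len(points):
        raise ValueError("need 2 <= j < number of points")
    densities = []
    for i, (xi, yi) in enumerate(points):
        # distances from point i to all other points, in ascending order
        dists = sorted(math.hypot(xi - xk, yi - yk)
                       for k, (xk, yk) in enumerate(points) if k != i)
        r_j = dists[j - 1]                 # distance to the j-th neighbour
        area = math.pi * r_j ** 2          # S(r_j): circle of radius r_j
        densities.append((j - 1) * mass / area)
    return densities
```

For four points spaced by 1 along a line and $j = 2$, the end point has $r_2 = 2$ and hence $\rho_2 = 1/(4\pi)$, while an inner point has $r_2 = 1$ and $\rho_2 = 1/\pi$.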

The NN method is non-parametric: unlike star count methods, it does not require the choice of a bin
size and depends only on the choice of j. Due to statistical fluctuations, even randomly distributed
points will show some degree of clustering, producing small clusters of a few objects. The higher
the number of members, the higher is the likelihood that the clustering is physically significant.
Casertano & Hut (1985) show that low j values, in particular j = 1 or 2, are extremely sensitive to
statistical fluctuations; they therefore suggest using a value of j > 6. On the other hand, the
choice of too large a j value results in a loss of sensitivity to real density variations on
smaller scales. Ferreira (2010) and Ferreira & Lada (in preparation) show that a value of j = 20 is
best suited to detect clusters with n > 20 members. For detecting substructure within a cluster a
lower j value is preferable, while higher j values may be used to trace large-scale structures.

The NN method also allows the determination of additional structural parameters. The positions of
the cluster centres are defined as the density-weighted enhancement centres (Casertano & Hut 1985)

$$ \mathbf{x}_{d,j} = \frac{\sum_i \mathbf{x}_i\, \rho_j^i}{\sum_i \rho_j^i}, \qquad (2) $$

where $\mathbf{x}_i$ is the position vector of the $i$th cluster member and $\rho_j^i$ the $j$th NN
density around this object.

Similarly, the density radius is defined as the density-weighted average of the distance of each
star from the density centre:

$$ r_{d,j} = \frac{\sum_i |\mathbf{x}_i - \mathbf{x}_{d,j}|\, \rho_j^i}{\sum_i \rho_j^i} \qquad (3) $$

(von Hoerner 1963; Casertano & Hut 1985). It corresponds to the observational core radius
(Casertano & Hut 1985).
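
The density-weighted centre and the density radius defined above are simple weighted means. A minimal sketch, assuming the positions and their $j$th NN densities are already available as two lists of equal length (the function name is hypothetical):

```python
import math

def density_centre_and_radius(points, densities):
    """Density-weighted centre and density radius of a set of 2D points."""
    total = sum(densities)
    xc = sum(x * rho for (x, _), rho in zip(points, densities)) / total
    yc = sum(y * rho for (_, y), rho in zip(points, densities)) / total
    # density-weighted mean distance of the points from the density centre
    r_d = sum(math.hypot(x - xc, y - yc) * rho
              for (x, y), rho in zip(points, densities)) / total
    return (xc, yc), r_d
```

For two points at (0, 0) and (2, 0) with densities 1 and 3, the centre lies at (1.5, 0) and the density radius is (1.5 · 1 + 0.5 · 3) / 4 = 0.75.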

The NN algorithm is easy to implement by computing and sorting the distances from any point to
every other point; however, this "naive" approach scales with $(n-1)^2$ and is therefore
computationally expensive for large n. More sophisticated algorithms have been developed to
overcome this by seeking to reduce the number of distance determinations required (e.g. Lee & Wong
1977; Aghbari 2005).

Clusters are considered as regions with densities above a certain threshold (e.g. $3\sigma$ above
the background density). Another approach is to use the distribution of the NN distances, which
shows a large peak for the background sources and another (usually smaller) one at shorter
distances for the cluster stars. Ferreira (2010) and Ferreira & Lada (in preparation) suggest



$$ d_{\rm cutoff} = d_{\rm field} - 1.5\,\sigma(d_{\rm field}) \qquad (4) $$

as the optimal cutoff value, where $d_{\rm field}$ is the peak of the distribution of $j$th NN
distances of the field and $\sigma(d_{\rm field})$ the standard deviation of the distribution of
these distances.

2.3 Voronoi tessellation

Another non-parametric method to determine the local source density is based on the Voronoi
tessellation. The Voronoi tessellation (Lejeune Dirichlet 1850; Voronoi 1908; see also Aurenhammer
& Klein 2000) is the partitioning of a plane with n points into n convex polygons such that each
polygon contains exactly one point and every point in a given polygon is closer to its generating
point than to any other (see Fig. 1). It is related to the Delaunay triangulation, which is its
dual graph. The higher the density in a certain region, the smaller are the areas of the individual
polygons. The local source density around a point can be defined as the reciprocal of the area of
the Voronoi polygon of this point. Care has to be taken at the borders of the point set, as the
areas can become extremely large there. Overdensities, and therefore potential clusters, can be
found in the same way as in the star count or NN method by applying a density threshold above the
mean background density.

Fig. 1 The Voronoi diagram of a set of points.

As the Voronoi tessellation method is very sensitive to small-scale fluctuations, the density
estimates can be smoothed with those of adjacent cells to obtain a more reliable measure of the
local density (e.g. Neyrinck et al. 2005; Gonzalez & Padilla 2009). Another way to interpolate the
density estimates is the penalised centroidal Voronoi tessellation (Browne 2007), which rearranges
the input points in order to generate a regularised estimate.

Cluster finding algorithms based on Voronoi tessellations have been applied to galaxy clusters
(e.g. Ramella et al. 2001; Kim et al. 2002; Panko & Flin 2004; van Breukelen et al. 2006) and for
finding overdensities in X-ray photon counts (Ebeling & Wiedenmann 1993); the only application to a
star cluster known to the author was done by Espinoza et al. (2009).

2.4 Minimum spanning tree separation

Minimum spanning trees (MSTs) are a construct of graph theory, related to the well-known travelling
salesman problem. The first description is given by Borůvka (1926); algorithms have also been
developed independently by Kruskal (1956), Prim (1957), and Loberman & Weinberger (1957). Details on
the historical evolution of MST algorithms can be found in Graham & Hell (1985). Accelerated
algorithms are presented e.g. by Bentley & Friedman (1978) and Rohlf (1978). MSTs were presumably
first associated with cluster analysis by Gower & Ross (1969). Astrophysical applications have been
discussed mainly with respect to the large-scale distribution of galaxies (e.g. Barrow et al. 1985;
Bhavsar & Ling 1988a, 1988b; Krzewina & Saslaw 1996; Adami & Mazure 1999; Doroshkevich et al. 2004).
Meanwhile MSTs have also been used for the identification of star clusters (Grebel et al. 1999;
Bastian et al. 2007, 2009; Koenig et al. 2008; Gutermuth et al. 2009; Maschberger et al. 2010;
Beerer et al. 2010).

The MST is the unique set of straight lines ("edges") connecting a given set of points ("vertices")
without closed loops, such that the sum of the edge lengths is minimum (Fig. 2b). The mean edge
length $\bar{\ell}$ of the MST can be used to quantify the cluster structure (Cartwright &
Whitworth 2004, 2009; Schmeja & Klessen 2006); the total edge length can be used to determine the
degree of mass segregation in a cluster (Allison et al. 2009).

The MST is a subgraph of the Delaunay triangulation (Shamos & Hoey 1975; Toussaint 1980), and in
that way connected to the Voronoi tessellation discussed above.

In the case of star clusters, the vertices correspond to the positions of the stars or YSOs and the
edge lengths $\ell$ to the Euclidean distance between two connected objects.

An additional reducing operation, called separating, can be used to isolate clusters (Zahn 1971;
Barrow et al. 1985; Schmeja & Klessen 2006). Separating means removing all edges of the MST whose
lengths exceed a certain limit $\ell_c$ (Fig. 2c). This procedure is also called partitioning,
cutting, clipping, splitting or fracturing. When removing edges from a MST, each remaining subgraph
is again a MST of its vertices. Having higher densities and therefore shorter edge lengths, the
clusters remain connected in a subtree, while being disconnected from the rest of the graph. This
procedure will also leave a lot of subtrees consisting of a small number of edges, due to
statistical density fluctuations or binary/multiple systems. Therefore a minimum number of cluster
members n has to be used as an additional criterion. A cluster is then defined as a subtree
consisting of $n - 1$ edges with $\ell < \ell_c$.

Fig. 2 (a) A set of points (the same as in Fig. 1), (b) the MST of this point set, (c) the
separated MST: all edges with lengths $\ell > \bar{\ell}$ have been removed.



Methods similar to the MST separation work by building up subtrees with edges smaller than a given
$\ell_c$ rather than constructing the MST and separating it. They include friends-of-friends
algorithms (e.g. Feitzinger & Braunsfurth 1984; Wilson 1991; Einasto et al. 1994), the path linkage
criterion (PLC; Battinelli 1991) and the "constellation graph" (Ueda & Itoh 1997; Ueda et al. 2009).

Compared to classical data clustering, where every data point is assigned to a cluster and the
number of desired clusters is usually given a priori, finding an adequate value for the cutoff
length $\ell_c$ is more difficult for star clusters. Several methods to determine $\ell_c$ have
been suggested. A straightforward way is to use a multiple of the mean edge length (Zahn 1971;
Barrow et al. 1985; Bhavsar & Ling 1988b; Plionis et al. 1992; Pearson & Coles 1995; Harari et al.
2006) or its standard deviation (Zahn 1971; Zucca et al. 1991; Schmeja & Klessen 2006). Campana et
al. (2008) argue that a value of $\ell_c \approx \bar{\ell}$ is best suited to isolate clusters.
Koenig et al. (2008) plot all edge lengths sorted by length. This distribution shows a pronounced
kink toward long edge lengths. Straight lines can then be fitted through the long- and short-length
portions of the distribution. The crossing of these lines defines the cutoff length. A similar
approach is used by Gutermuth et al. (2009) and Beerer et al. (2010). Graham et al. (1995), Tesch &
Engels (2000), and Bastian et al. (2007), following Battinelli (1991), apply different values of
$\ell_c$ and plot the number of identified clusters as a function of $\ell_c$. The peak of this
function is then chosen as $\ell_c$, i.e. the cutoff length that produces the maximum number of
clusters. Maschberger et al. (2010) also apply different values of $\ell_c$ and choose it such that
the subclusters found by the algorithm "have properties similar to subclusters which are selected
by eye".

To evaluate whether a detected structure is a true cluster or not, Campana et al. (2008) and
Massaro et al. (2009) introduce additional parameters. The clustering parameter $g$ is defined as
the ratio between the mean edge length of the entire MST and the mean edge length of a subtree:
$g = \bar{\ell}_{\rm MST}/\bar{\ell}_{\rm subtree}$. The higher the value of $g$, the more likely
it is that a candidate cluster is a true one. The magnitude $M_k = n_k g_k$ combines the clustering
parameter $g$ with the number of vertices in a particular subtree $k$. A high value of $M$ is
expected to point to a real cluster.
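
The MST construction, the separation at a cutoff length and the clustering parameter $g$ can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the tests: it uses a simple $O(n^2)$ form of Prim's algorithm, the function names `prim_mst` and `separate` are hypothetical, and the magnitude is taken here as the number of subtree members times $g$, which is how the definition above is read.

```python
import math

def prim_mst(points):
    """MST edges as (length, i, j) from Prim's algorithm, O(n^2) in the number of points."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n          # shortest known link from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the point closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def separate(n_points, edges, l_c, n_min=3):
    """Remove edges longer than l_c; return subtrees with at least n_min members."""
    root = list(range(n_points))   # union-find over the kept edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    kept = [e for e in edges if e[0] <= l_c]
    for _, i, j in kept:
        root[find(i)] = find(j)
    members = {}
    for i in range(n_points):
        members.setdefault(find(i), []).append(i)
    mean_all = sum(e[0] for e in edges) / len(edges)
    clusters = []
    for r, stars in members.items():
        if len(stars) < max(n_min, 2):
            continue
        lengths = [length for length, i, _ in kept if find(i) == r]
        g = mean_all / (sum(lengths) / len(lengths))   # clustering parameter
        clusters.append({"members": stars, "g": g, "M": len(stars) * g})
    return clusters
```

Usage on a toy set of four close points and two distant ones: separating at the mean edge length keeps the compact group as a single subtree and drops the two isolated points.

```python
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (9, 9)]
edges = prim_mst(pts)
mean_len = sum(e[0] for e in edges) / len(edges)
clusters = separate(len(pts), edges, l_c=mean_len, n_min=3)
```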

3 The model clusters 

In order to test the algorithms, different sets of clusters are 
created to reflect the wide range in observed morphologies. 
While open and globular clusters usually show stellar surface density distributions with relatively
smooth radial profiles that can be described in good approximation by simple power-law functions,
Gaussian or King (1962) profiles, embedded clusters often show a hierarchical structure with
multiple density peaks and possible fractal substructure (Lada & Lada 2003). Furthermore, clusters
can be incompletely
sampled due to varying extinction or crowding and over- 
exposure, and therefore appear irregularly shaped. Massive 
centrally concentrated clusters (in particular globular clus- 
ters) may not be resolved into point sources in the central 
region, making them appear as rings or "doughnuts" in stel- 
lar density maps. 

The cluster sets consist of

- centrally condensed clusters (R) with radial density profiles $\rho(r) \propto r^{-\alpha}$,
where $\alpha = 0.1$, 1, and 1.5; they are created as described by Cartwright & Whitworth (2004);

- fractal clusters (F) with fractal dimension $D = 1.9$; they are created following the algorithm
described in Cartwright & Whitworth (2004) and Goodwin & Whitworth (2004);

- elongated (elliptical) clusters with axis ratios of $a/b = 2$ ($e = 0.87$; E2) and $a/b = 3$
($e = 0.94$; E3);

- "doughnuts" (D), created by cutting out a circular region with $r = 0.3$ around the centres of
centrally condensed clusters ($\alpha = 1.5$). These regions are empty, i.e. also lacking
background stars.
Table 1 The model clusters

Model      density profile             $Q$    $n_*$   $\rho_{\rm cl}/\rho_{\rm bg}$
R0.1_50    radial ($\alpha = 0.1$)     0.77    50     1.4
R0.1_100   radial ($\alpha = 0.1$)     0.76   100     1.8
R0.1_200   radial ($\alpha = 0.1$)     0.76   200     2.6
R0.1_500   radial ($\alpha = 0.1$)     0.76   500     5.0
R1.0_50    radial ($\alpha = 1.0$)     0.85    50     1.4
R1.0_100   radial ($\alpha = 1.0$)     0.85   100     1.8
R1.0_200   radial ($\alpha = 1.0$)     0.85   200     2.6
R1.5_50    radial ($\alpha = 1.5$)     0.97    50     1.4
R1.5_100   radial ($\alpha = 1.5$)     0.97   100     1.8
R1.5_200   radial ($\alpha = 1.5$)     0.97   200     2.6
F1.9_100   fractal ($D = 1.9$)         0.66   100     1.8
F1.9_200   fractal ($D = 1.9$)         0.63   200     2.6
F1.9_500   fractal ($D = 1.9$)         0.59   500     5.0
E2_50      elliptical ($a/b = 2$)      0.78    50     1.8
E2_100     elliptical ($a/b = 2$)      0.78   100     2.6
E2_200     elliptical ($a/b = 2$)      0.78   200     4.2
E3_50      elliptical ($a/b = 3$)      0.78    50     2.2
E3_100     elliptical ($a/b = 3$)      0.78   100     3.4
E3_200     elliptical ($a/b = 3$)      0.78   200     5.8
D100       doughnut                    0.75   100     1.8
D200       doughnut                    0.75   200     2.6
D500       doughnut                    0.75   500     5.0

The number of cluster members lies in the range between 50 and 200 or 500, values typical for
embedded and open clusters.¹ The centre of each cluster is at (0,0) and its radius (or semimajor
axis) is 1. All clusters are overlaid on a $10 \times 10$ background field of 4000 randomly
distributed stars. Each cluster/background configuration is realised 100 times in order to obtain
mean values and standard deviations. The clusters are listed in Table 1 along with their average
$Q$ parameter (Cartwright & Whitworth 2004, 2009) and the average overdensity of the clusters with
respect to the background ($\rho_{\rm cl}/\rho_{\rm bg}$). Note that for determining the cluster
density the entire cluster area is considered, so for centrally concentrated clusters the central
density (and therefore the overdensity) is much higher.
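
The average overdensities of the circular and elliptical models can be checked with a short calculation. The sketch below assumes that the cluster density includes the underlying background stars, which is an inference from the tabulated values rather than a statement in the text; the function name is hypothetical.

```python
import math

def mean_overdensity(n_members, cluster_area, n_background=4000, field_area=100.0):
    """Average cluster density (members plus underlying field) over the field density."""
    rho_bg = n_background / field_area               # 40 stars per unit area
    rho_cl = n_members / cluster_area + rho_bg       # assumption: background is included
    return rho_cl / rho_bg

circle = math.pi * 1.0 ** 2                  # circular cluster of radius 1
ellipse_e3 = math.pi * 1.0 * (1.0 / 3.0)     # semimajor axis 1, a/b = 3

print(round(mean_overdensity(100, circle), 1))       # 1.8, as listed for R1.0_100
print(round(mean_overdensity(200, ellipse_e3), 1))   # 5.8, as listed for E3_200
```

With 100 members inside a circle of radius 1, the cluster contributes about 31.8 stars per unit area on top of the 40 background stars, giving a ratio of about 1.8.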

As an additional case, a series of clusters (R1.0_100 and F1.9_200) are superimposed over a
non-uniform background with a density gradient along the y axis (4000 stars,
$\rho_{\rm bg}(y) \propto (y+5)^{-1}$). Three identical clusters are then placed
at (0,0), (-3,3) and (3,-3).

4 Implementation of the algorithms 

The five algorithms described in Sect. 2 are applied to the
artificial clusters in the following way: 

For the star count (SC) method, the area is divided into square bins of $0.5 \times 0.5$ (providing
on average 10 sources per bin) separated by 0.25 (the Nyquist criterion). Clusters are selected as
regions that have a stellar density $3\sigma$ above the mean background density (determined in the
region $y < -2$ and $y > 2$). For the given models, this seems to be the best compromise between
missing real clusters and detecting false ones.

¹ Higher numbers of cluster members would, in the given configuration, only increase the
overdensity of the cluster and therefore facilitate its identification. Therefore, for this study,
these cases can be neglected as trivial.

The NN method is applied by computing the 20th NN density of the objects in the field; clusters are
defined as regions with densities $3\sigma$ above the background level. This yields results similar
to those of the more sophisticated method of Ferreira (2010), which produces cutoff values very
close to the $3\sigma$ value in all cases.

The Voronoi tessellation (VT) is performed via the Delaunay triangulation using the procedures
provided in IDL. To avoid border effects, points at the edges are ignored when computing the mean
background density. Clusters are defined as density enhancements $2\sigma$ above the background
level. In a second step, the obtained density estimates are smoothed over all adjacent cells
(called sVT).

The MST of all sources is constructed using Prim's (1957) algorithm, and then separated at
$\ell_c = \bar{\ell}$ and $n = 20$. In agreement with Campana et al. (2008), $\ell_c = \bar{\ell}$
seems to yield the best results, although it only works well for distinct clusters (see also the
discussion in Sect. 5.4). The method of applying different $\ell_c$ values and choosing the value
that leads to the maximum number of clusters obviously does not work in our case, where only one
cluster is present. The approach of Koenig et al. (2008) produces too high values for $\ell_c$ and
is therefore not applicable either.

In all cases the cluster radius is defined as the radius of a circle with the same area as the
cluster area $A_{\rm cl}$ (the effective or equivalent radius, Carpenter et al. 2000; Ferreira
2010):

$$ r_{\rm eq} = \sqrt{A_{\rm cl}/\pi}. \qquad (5) $$

In addition, the NN method also provides the density radius $r_d$ (Eq. 3). The cluster area is
defined as the area enclosed by the cluster boundary contour in the SC, NN and VT method and as the
normalized convex hull of the cluster members (Hoffman & Jain 1983; Schmeja & Klessen 2006) in the
MST method. The cluster centre is defined as the centroid of the objects within the cluster area,
except for the NN method, where the density-weighted centre (Eq. 2) is used instead. The number of
cluster members $n_*$ is estimated as the number of objects lying within the cluster area. To
facilitate comparison with the true values, $n_*$ is corrected by the average number of background
sources expected in the cluster area.
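
For the MST method, the equivalent radius follows from the convex hull of the cluster members. The sketch below is illustrative only: the hull is built with Andrew's monotone chain algorithm (a choice made here, not one named in the text), and the normalisation is assumed to be the hull area divided by $1 - n_{\rm hull}/n_{\rm total}$, where $n_{\rm hull}$ is the number of hull vertices. That form is an assumption about the normalized convex hull and should be checked against Hoffman & Jain (1983) and Schmeja & Klessen (2006).

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def equivalent_radius(points):
    """Radius of the circle with the same area as the normalised convex hull."""
    hull = convex_hull(points)
    if len(hull) >= len(points):
        raise ValueError("need at least one point strictly inside the hull")
    # assumed normalisation: hull area / (1 - n_hull / n_total)
    area = polygon_area(hull) / (1.0 - len(hull) / len(points))
    return math.sqrt(area / math.pi)
```

For the four corners of a unit square plus four interior points, the hull area is 1, the assumed normalisation doubles it, and the equivalent radius is $\sqrt{2/\pi} \approx 0.80$.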

5 Results 

Tables 2 to 6 list the parameters (radius, cluster centre, number
of stars, and others as indicated) of the clusters as they
are detected by the different methods. If a cluster model is 
not listed, this means that it could not be detected by the 
method. The behaviour of the individual methods will be 
discussed below. The case of clusters in a non-uniform field 
is only discussed qualitatively in the text. 
Fig. 3 Four exemplary clusters of model R1.0_100 (first column), E3_100 (second column), F1.9_200
(third column), and D500 (last column), their density maps from star counts (second row), 20th NN
density (third row) and VT (fourth row), and the separated MST (last row). The black lines indicate
the cluster boundaries as defined for the respective method (see text for details).
Four clusters, one of each type (R, E, F, D), are shown as examples in Figure 3 along with their
stellar density maps (from the SC, VT, and NN method) and their separated MST. Figure 4 shows the
two studied cases of clusters in a non-uniform field in the same arrangement.

5.1 Star counts 

Table 2 gives the parameters of the clusters detected by the
star count method. 

The clusters of models R0.1_50 and E2_50 are not found; their density enhancement is not larger
than that of random fluctuations. In the case of all R0.1_100 clusters, some density enhancement is
found at the cluster position; however, in most cases its shape does not resemble the true one and
the estimated number of cluster members is much too low. For R0.1_200 and R0.1_500 the detections
become quite reliable. The clusters of the R1.0 and R1.5 models are all identified correctly. The
determined numbers of cluster members are impressively close to the true values; however, the
estimated cluster sizes (< 0.5) are much smaller. This is understandable, since due to the high
degree of central concentration the vast majority of cluster members lie within that small radius,
while the few outside are statistically indistinguishable from the background. Consequently, the
numbers of objects are correctly determined.

The elliptical clusters with $n_* > 100$ and the doughnuts are identified correctly, although the
number of member stars tends to be underestimated in all cases. All the fractal clusters of models
F1.9_200 and F1.9_500 are detected; however, in some cases, multiple density peaks are identified
as separate clusters, explaining the rather low numbers of detected members $n_*$ along with the
high standard deviations.

A density threshold of $3\sigma$ above the background turns out to be best suited for detecting the
given clusters. While a threshold $< 2\sigma$ results in a better detection of low-density
clusters, at the same time it produces too many fake clusters, which are basically
indistinguishable from the real one. A threshold $> 3\sigma$ appears to be too rigid and
underestimates the cluster sizes. The choice of this threshold is obviously more relevant for
clusters with low overdensity: while e.g. changing the threshold from 2 to $3\sigma$ changes $r$
from 0.68 to 0.39 and $n_*$ from 70 to 32 for R0.1_100, the effect is negligible for dense
clusters. For R1.5_200, $r$ changes only from 0.52 to 0.51, while $n_* = 200$ remains the same.

The situation becomes more complicated for clusters in a non-uniform field. As the central
overdensity is significant, the R1.0_100 cluster is detected in all three cases, along with several
random density enhancements in the densest part. The cluster F1.9_200 is detected only in the
densest part (recall that the detected structure consists of the actual cluster plus background).
In the other two positions, a cluster is clearly visible in the stellar density maps, but not
detected using the $3\sigma$ threshold (which is derived from the average background).



Fig. 4 Three identical clusters of model R1.0_100 (left
column) and F1.9_200 (right column) on a background with
a gradient, and their SC, NN, and VT density maps and the
MST separated at $\ell_c = 1.5\bar{\ell}$, arranged in the same way as
Table 2 Cluster parameters from the star count method

Model      $x$    $\sigma_x$  $y$    $\sigma_y$  $r_{\rm eq}$  $\sigma_r$  $n_*$  $\sigma_n$
R0.1_100   0.004  0.277       -0.012  0.266       0.39          0.14        32     20
R0.1_200  -0.009  0.070        0.004  0.067       0.87          0.08        173    25
R0.1_500   0.002  0.022        0.001  0.023       1.09          0.02        503    12
R1.0_50   -0.005  0.051        0.003  0.049       0.41          0.03        44     7
R1.0_100   0.000  0.027        0.000  0.029       0.51          0.02        100    7
R1.0_200  -0.001  0.021       -0.001  0.020       0.61          0.03        202    7
R1.5_50    0.000  0.017       -0.003  0.021       0.40          0.02        51     5
R1.5_100   0.001  0.011       -0.001  0.013       0.47          0.01        102    6
R1.5_200  -0.001  0.007        0.001  0.007       0.51          0.00        200    5
E2_100     0.013  0.106       -0.003  0.053       0.61          0.05        84     14
E2_200     0.003  0.045       -0.002  0.023       0.77          0.02        201    10
E3_50      0.011  0.242       -0.006  0.078       0.34          0.09        26     12
E3_100     0.016  0.096       -0.001  0.029       0.59          0.03        95     10
E3_200    -0.003  0.041       -0.002  0.014       0.72          0.02        202    7
F1.9_100   0.015  0.298        0.008  0.326       0.49          0.10        55     20
F1.9_200  -0.028  0.179        0.007  0.144       0.80          0.08        180    22
F1.9_500   0.004  0.106       -0.011  0.138       1.02          0.07        499    30
D100      -0.051  0.345        0.016  0.340       0.37          0.10        31     17
D200      -0.003  0.063        0.002  0.061       0.85          0.04        165    15
D500       0.003  0.024        0.006  0.021       1.06          0.02        493    12

Table 3 Cluster parameters from the NN method

Model      $x$    $\sigma_x$  $y$    $\sigma_y$  $r_d$  $\sigma_{r_d}$  $r_{\rm eq}$  $\sigma_{r_{\rm eq}}$  $n_*$  $\sigma_n$
R0.1_100  -0.004  0.153        0.028  0.156       0.415  0.226           0.553         0.286                  53     33
R0.1_200  -0.005  0.071       -0.001  0.065       0.608  0.091           0.976         0.139                  188    31
R0.1_500   0.004  0.036        0.003  0.037       0.625  0.021           1.162         0.020                  499    13
R1.0_50   -0.003  0.048        0.003  0.045       0.221  0.053           0.467         0.048                  47     8
R1.0_100  -0.000  0.021       -0.000  0.023       0.180  0.037           0.565         0.034                  100    8
R1.0_200  -0.001  0.011        0.000  0.013       0.147  0.028           0.669         0.031                  200    8
R1.5_50    0.000  0.001        0.000  0.001       0.008  0.003           0.356         0.021                  49     5
R1.5_100   0.000  0.000        0.000  0.000       0.003  0.001           0.372         0.020                  99     6
R1.5_200   0.000  0.000        0.000  0.000       0.001  0.000           0.383         0.016                  198    4
E2_50      0.006  0.207        0.003  0.145       0.284  0.128           0.380         0.163                  24     15
E2_100     0.008  0.082       -0.002  0.053       0.487  0.043           0.765         0.046                  101    12
E2_200     0.001  0.054       -0.002  0.030       0.480  0.027           0.878         0.030                  201    11
E3_50     -0.004  0.165       -0.014  0.071       0.392  0.097           0.547         0.117                  45     13
E3_100     0.016  0.092       -0.001  0.031       0.443  0.043           0.720         0.040                  102    9
E3_200    -0.009  0.058       -0.002  0.020       0.433  0.031           0.814         0.026                  199    9
F1.9_100   0.032  0.217       -0.014  0.237       0.532  0.089           0.757         0.101                  85     20
F1.9_200  -0.025  0.152        0.012  0.130       0.585  0.080           0.974         0.064                  196    15
F1.9_500   0.000  0.142       -0.012  0.164       0.594  0.088           1.123         0.072                  497    26
D100      -0.019  0.165       -0.005  0.205       0.566  0.081           0.757         0.112                  71     20
D200      -0.001  0.064       -0.001  0.062       0.603  0.028           0.995         0.039                  184    13
D500       0.000  0.033        0.007  0.029       0.577  0.016           1.154         0.027                  493    12

5.2 Nearest neighbour density 

The NN algorithm performs similarly to the SC method for
the centrally concentrated clusters (R) and slightly better for
the other models (E, F, D), where the number of cluster stars
is closer to the true value. Apart from model R0.1_50 (where
some density enhancement can often be seen at the expected
position, although with a density peak often smaller than
that of random density enhancements), all clusters are roughly
or exactly identified. Owing to the nature of the methods,
the NN density maps show a better resolution than density
maps from star counts. This is not very relevant for the
identification of the clusters, but it is a useful feature for
additional studies of the cluster structure, such as the
detection of individual density peaks in hierarchical clusters.

Table 3 lists the parameters of the detected clusters. In
addition to the equivalent radius (Columns 8 and 9) and the
number of stars (Columns 10 and 11), the NN method also
provides the coordinates of the density centre (Columns 2 to
5) and the density radius (Columns 6 and 7). As expected,
the position of the density centre is close to (0,0) in the
centrally concentrated clusters, but can be significantly shifted
in the fractal clusters. The density radius is very small for
highly centrally concentrated clusters (R1.5).

Fig. 5 The stellar density map of a cluster of type
F1.9_200 derived from the VT (left) and the sVT method
(right).

Concerning the cluster sizes and the selected density 
threshold, the same considerations as for the SC method ap- 
ply. 
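
The nearest neighbour quantities discussed in this subsection (local density, density centre and density radius) can be illustrated with a minimal Python sketch. It is not the implementation used for this study. It assumes the widely used estimator of Casertano & Hut (1985), rho_j = (j - 1) / (pi r_j^2), where r_j is the distance to the j-th nearest neighbour, and it assumes that the density radius is the density-weighted mean distance from the density centre. All function names and the toy field are illustrative.

```python
import math
import random


def nn_density(points, j=20):
    """Surface density at each point from the distance to its j-th nearest neighbour.

    Assumed estimator (not restated in this section): rho_j = (j - 1) / (pi * r_j**2).
    """
    densities = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xk, yi - yk)
            for k, (xk, yk) in enumerate(points)
            if k != i
        )
        r_j = dists[j - 1]  # distance to the j-th nearest neighbour
        densities.append((j - 1) / (math.pi * r_j ** 2))
    return densities


def density_centre(points, densities):
    """Density-weighted mean position of all points."""
    total = sum(densities)
    x_c = sum(rho * x for rho, (x, _) in zip(densities, points)) / total
    y_c = sum(rho * y for rho, (_, y) in zip(densities, points)) / total
    return x_c, y_c


def density_radius(points, densities, centre):
    """Density-weighted mean distance from the density centre (assumed definition)."""
    x_c, y_c = centre
    weighted = sum(
        rho * math.hypot(x - x_c, y - y_c) for rho, (x, y) in zip(densities, points)
    )
    return weighted / sum(densities)


# Toy field (hypothetical): 100 clustered stars on top of 200 uniform field stars.
rng = random.Random(1)
cluster = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(100)]
field = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(200)]
stars = cluster + field

rho = nn_density(stars, j=20)
centre = density_centre(stars, rho)
print(centre, density_radius(stars, rho, centre))
```

The brute-force neighbour search scales quadratically with the number of stars, which is in line with the long runtimes reported for the NN method; a spatial index would be the obvious refinement for large fields.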

For clusters in a non-uniform field, the NN algorithm
also performs similarly to the SC method. The centrally
concentrated cluster is detected in all cases, but only the density
peaks of the fractal clusters are found; the clusters as such
are not clearly distinguishable from the background.

5.3 Voronoi tessellation 

Only clusters with a relatively high density contrast are
reliably found by the VT method (see Table 4). Only the
clusters of model R1.5 are exactly identified, although these are
recovered with the exact number of objects and a very small
standard deviation. Lower-density clusters (n < 200) are hardly
detected. While there is usually some density enhancement seen at
the position of the cluster, this often corresponds to only a few
Voronoi cells and is in any case much smaller than the real
cluster. Most of these detections are indistinguishable in size
and density from random density enhancements found in the
field. The clusters with no clear density gradient (R0.1, E,
F) are usually broken up into smaller fragments that are
detected as separate clusters, explaining the low number n*
and high $\sigma_n$. Owing to the partition into polygonal cells and
its non-smoothing nature, the shapes of the detected clusters
are often very irregular and filamentary.

When applying the smoothing procedure, the ability to
detect clusters increases (see Table 5 and Fig. 5), i.e. more
cluster members are identified, and some clusters not found
by the VT method are identified. Still, only very dense
clusters (R1.0_200, R1.5) are reliably detected by the sVT method;
in the other cases the estimated number of cluster members
is too low, at a relatively high error.
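
The effect of smoothing over adjacent cells can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the local density of a star is taken as the inverse of its Voronoi cell area, the smoothed value is a plain unweighted mean over the cell and its direct neighbours, and the cell areas and the neighbour lists are assumed to be supplied by a separate tessellation step. The exact smoothing procedure used for the sVT results may differ, and all names are hypothetical.

```python
def voronoi_densities(cell_areas):
    """Local density of each star as the inverse of its Voronoi cell area."""
    return {star: 1.0 / area for star, area in cell_areas.items()}


def smooth_densities(densities, neighbours):
    """Average each cell's density with those of its adjacent cells.

    `neighbours` maps a star to the stars whose cells share an edge with it.
    """
    smoothed = {}
    for star, rho in densities.items():
        values = [rho] + [densities[n] for n in neighbours.get(star, ())]
        smoothed[star] = sum(values) / len(values)
    return smoothed


# Hypothetical three-cell example: a small (dense) cell next to two larger ones.
areas = {"a": 0.5, "b": 1.0, "c": 2.0}
adjacent = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
raw = voronoi_densities(areas)
print(smooth_densities(raw, adjacent))
```

Averaging damps the cell-to-cell scatter that fragments low-contrast clusters, at the price of lowering the peak value of isolated dense cells.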

Both centrally concentrated and fractal clusters in the
non-uniform field are hardly detected by the VT method.
While the clusters at y = 3 and y = 0 may be roughly
distinguishable by eye in the VT density maps, their
overdensity is hardly significant and therefore not identifiable
by applying a certain density threshold. The clusters at y =
-3 (in the densest part), on the other hand, are completely
merged with the background.

Fig. 6 The MST of a cluster of type R0.1_200, separated
at $l_c = \bar{l}$ (left) and $l_c = 1.4\,\bar{l}$ (right) and N = 20. The thick
lines show the largest identified cluster. The circle indicates
the true cluster area (r = 1).

5.4 Minimum spanning tree separation 

Only clusters with a relatively high density contrast (models
R1.0, R1.5) are reliably found by the MST method with
the chosen $l_c$. Clusters of the other models are either not
detected at all or clearly too small, indicating that the
cluster is broken up into fragments. This behaviour is similar to
the VT method; interestingly, the average numbers of cluster
members detected by the MST method are often close to
those from the sVT method.

The chosen cutoff length of $l_c = \bar{l}$ works well for the
pronounced clusters (R1.0, R1.5). For clusters with a smaller
density contrast or hierarchical clusters, this underestimates
the cluster size. Changing $l_c$ e.g. from $\bar{l}$ to $1.4\,\bar{l}$ for the
model R0.1_200 shifts the average number of cluster members
from 51 to 195, close to the expected value; however,
at the same time a lot of false clusters with N > 20 are
detected. Depending on the value of $l_c$, either the cluster is
broken up into several fragments, or it is detected in its true
size along with a lot of random density enhancements
erroneously identified as clusters as well (see Fig. 6). Most of
the false clusters seen in Fig. 6 are very elongated, so this
might be used as an additional (but not unambiguous)
criterion to distinguish true clusters from random density
enhancements. The clustering parameters g and M also help
in filtering true clusters. In the example of Fig. 6 (right)
the subtree corresponding to the real cluster indeed has the
highest g and M values (g = 1.56, M = 579), while the
other subtrees show values 1.07 < g < 1.35 and 23 <
M < 63. M in particular seems a good criterion to
distinguish real clusters from random density enhancements,
although the difference (and therefore the criterion where
to draw the line) is not always that clear. Nevertheless, this
does not help in the a priori choice of $l_c$.
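
The separation step itself is simple to express in code, which makes the sensitivity to the cutoff easy to explore. The following Python sketch builds the MST with Prim's algorithm, removes all edges longer than the cutoff and keeps the remaining subtrees with at least a minimum number of members. It is a sketch, not the code used for this study. In particular, expressing the cutoff as `cutoff_factor` times the mean MST edge length is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[u][0] - points[v][0], points[u][1] - points[v][1])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def separate_clusters(points, cutoff_factor=1.0, min_members=20):
    """Cut MST edges longer than l_c and return subtrees with >= min_members points."""
    edges = minimum_spanning_tree(points)
    if not edges:
        return []
    # Assumption: l_c is given in units of the mean MST edge length.
    l_c = cutoff_factor * sum(length for length, _, _ in edges) / len(edges)

    root = list(range(len(points)))  # union-find over the surviving edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for length, i, j in edges:
        if length <= l_c:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) >= min_members]


# Two well separated 5 x 5 grids of points are recovered as two clusters.
clump_a = [(0.1 * i, 0.1 * k) for i in range(5) for k in range(5)]
clump_b = [(10.0 + 0.1 * i, 10.0 + 0.1 * k) for i in range(5) for k in range(5)]
print([len(c) for c in separate_clusters(clump_a + clump_b)])
```

Re-running `separate_clusters` with different values of `cutoff_factor` on the same field is a quick way to see the trade-off described above between fragmenting a real cluster and picking up random density enhancements.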

Clusters in a non-uniform field are hard to isolate using
the MST method, at least with a uniform $l_c$. Using $l_c = \bar{l}$,
only small fragments of the clusters (and random density
enhancements of the dense part of the background) are
found, while for $l_c = 1.5\,\bar{l}$ the clusters at y = 3 and y = 0 are
partially found, along with a contiguous structure in the dense
part.

Table 4 Cluster parameters from the Voronoi tessellation

Model           x     σ_x        y     σ_y   r_eq    σ_r    n*   σ_n
R0.1_500    0.032   0.139    0.038   0.141   0.74   0.13   364    90
R1.0_100    0.023   0.146   -0.017   0.164   0.26   0.04    63    12
R1.0_200   -0.011   0.092    0.014   0.094   0.40   0.04   173    16
R1.5_50     0.001   0.050    0.003   0.043   0.08   0.02    47     2
R1.5_100    0.001   0.026   -0.003   0.032   0.10   0.01    98     2
R1.5_200   -0.002   0.014    0.002   0.015   0.11   0.01   198     2
E2_200      0.060   0.255    0.021   0.179   0.33   0.10    77    36
E3_100      0.029   0.459   -0.045   0.393   0.18   0.06    25    13
E3_200     -0.025   0.206   -0.006   0.161   0.44   0.07   148    32
F1.9_200   -0.017   0.361    0.056   0.385   0.24   0.07    47    25
F1.9_500    0.028   0.253   -0.032   0.288   0.53   0.10   303    92
D500       -0.005   0.101    0.005   0.121   0.73   0.09   369    55

Table 5 Cluster parameters from the Voronoi tessellation with smoothing

Model           x     σ_x        y     σ_y   r_eq    σ_r    n*   σ_n
R0.1_200   -0.009   0.308    0.005   0.335   0.41   0.13    65    36
R0.1_500    0.027   0.119    0.024   0.110   0.96   0.03   480    16
R1.0_50    -0.026   0.266   -0.023   0.269   0.24   0.04    29     8
R1.0_100    0.009   0.154   -0.003   0.155   0.37   0.04    83    10
R1.0_200   -0.013   0.100    0.016   0.103   0.48   0.03   191     9
R1.5_50     0.005   0.089   -0.005   0.086   0.21   0.03    48     3
R1.5_100    0.006   0.058   -0.004   0.062   0.23   0.03    98     3
R1.5_200   -0.003   0.031    0.001   0.030   0.24   0.02   198     3
E2_100     -0.028   0.389    0.070   0.348   0.28   0.09    32    17
E2_200     -0.001   0.140    0.005   0.127   0.62   0.04   172    17
E3_100      0.018   0.300   -0.015   0.255   0.38   0.08    62    21
E3_200     -0.012   0.121    0.001   0.159   0.58   0.06   188    20
F1.9_200   -0.007   0.295    0.006   0.313   0.48   0.10   106    38
F1.9_500   -0.017   0.186   -0.024   0.198   0.83   0.11   449    68
D200        0.023   0.299   -0.018   0.247   0.47   0.14    89    38
D500       -0.015   0.111   -0.002   0.109   0.92   0.03   458    15

Table 6 Cluster parameters from the MST method

Model           x     σ_x        y     σ_y   r_eq    σ_r    n*   σ_n
R0.1_200    0.019   0.345    0.061   0.359   0.44   0.10    51    18
R0.1_500    0.002   0.045    0.004   0.049   0.99   0.03   479    32
R1.0_50    -0.008   0.094   -0.003   0.068   0.31   0.08    37    10
R1.0_100   -0.003   0.044   -0.002   0.044   0.44   0.05    94    11
R1.0_200   -0.002   0.022    0.001   0.022   0.52   0.03   205     9
R1.5_50     0.000   0.009   -0.001   0.011   0.09   0.04    53     3
R1.5_100    0.001   0.007    0.000   0.011   0.10   0.05   104     4
R1.5_200   -0.003   0.002   -0.000   0.002   0.09   0.02   203     2
E2_100     -0.035   0.333    0.004   0.148   0.36   0.08    37    14
E2_200      0.020   0.135    0.001   0.063   0.67   0.08   170    32
E3_100      0.024   0.306    0.008   0.063   0.43   0.08    62    19
E3_200     -0.011   0.072   -0.001   0.014   0.62   0.04   200    12
F1.9_100   -0.037   0.491    0.029   0.560   0.29   0.08    28    11
F1.9_200   -0.031   0.363    0.034   0.106   0.47   0.11    84    33
F1.9_500   -0.031   0.258   -0.050   0.289   0.74   0.15   351    94
D200       -0.011   0.272   -0.035   0.272   0.54   0.14    76    26
D500        0.000   0.040    0.010   0.033   0.97   0.03   452    23

6 Discussion and Conclusions 

Table 7 provides a schematic overview of the performance
of the five algorithms: an open circle indicates a rough
identification of the cluster (some density enhancement detected,
however with a size and/or shape significantly different from
the true one), a filled circle indicates that the model cluster is
identified correctly (n* of the identified cluster has a
maximum deviation of about 5% from the true value), while a
dash shows that no cluster is found by the algorithm. At
first glance, Table 7 suggests that the NN method is the
most reliable one, finding the cluster in all but one model,
with exact identifications in 14 cases. At the other end, the
VT method delivers only three exact identifications and 12
non-detections. However, the results for this specific sample
cannot necessarily be generalized, since the ability to detect
clusters depends strongly on the type of cluster.
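
The classification used for the schematic overview can be written down as a small helper. The sketch below is a simplification: it looks only at the number of detected cluster members, whereas the rough category in the text also refers to size and shape, and the source gives the tolerance only as "about 5%". The function name and the return labels are illustrative.

```python
def classify_detection(n_detected, n_true, tolerance=0.05):
    """Return 'filled', 'open' or 'none' for one model and one algorithm.

    n_detected is None when the algorithm finds no cluster at all.
    """
    if n_detected is None:
        return "none"  # shown as a dash in the overview table
    if abs(n_detected - n_true) <= tolerance * n_true:
        return "filled"  # member count within the tolerance of the true value
    return "open"  # some enhancement found, but clearly off in size


# Examples with member counts from the NN results (Table 3).
print(classify_detection(201, 200))  # model E2_200
print(classify_detection(24, 50))    # model E2_50
print(classify_detection(None, 50))  # no detection
```

Because the threshold is only approximate, borderline cases (deviations of 5 to 6 per cent) can fall on either side in the published table.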

Centrally concentrated clusters (models R1.0 and R1.5)
are reliably detected by all algorithms, with the accuracy
increasing with the number of cluster stars
(and therefore with the overdensity). On the other hand,
subclusters of fractal clusters are often identified as separate
clusters; the VT and MST methods are particularly prone to this.
(However, it is ultimately a question of definition whether
two or more density peaks are called subclusters of a larger
cluster or individual clusters.)

Clusters superimposed over a non-uniform background 
are most reliably detected in the SC and NN density maps, 
while they are hard, if not impossible, to distinguish from 
the background using the VT and MST methods. However, 
while these clusters may be identified by eye in the stellar 
density maps, they are not necessarily picked up by an au- 
tomated algorithm using a fixed density threshold for the 
entire area. This illustrates the importance of the choice of 
an adequate sampling window or an adaptive way of deter- 
mining the density threshold from the local environment of 
potential clusters. 
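
One possible form of such a locally determined threshold is sketched below in Python. It is an assumption-laden illustration rather than a method tested in this study: the threshold for a map cell is taken as the mean plus k standard deviations of the densities in a surrounding frame of cells, with the cells closest to the candidate excluded. The window sizes, the factor k and the function names are all hypothetical choices.

```python
import statistics


def local_threshold(density_map, row, col, inner=1, outer=4, k=3.0):
    """Threshold for one cell from the mean and scatter of a surrounding frame.

    Cells within `inner` of (row, col) are excluded so that a candidate cluster
    does not raise its own background estimate.
    """
    n_rows, n_cols = len(density_map), len(density_map[0])
    values = []
    for r in range(max(0, row - outer), min(n_rows, row + outer + 1)):
        for c in range(max(0, col - outer), min(n_cols, col + outer + 1)):
            if abs(r - row) > inner or abs(c - col) > inner:
                values.append(density_map[r][c])
    return statistics.mean(values) + k * statistics.pstdev(values)


def is_overdense(density_map, row, col, **kwargs):
    """True if the cell exceeds the threshold derived from its local environment."""
    return density_map[row][col] > local_threshold(density_map, row, col, **kwargs)


# Flat toy map (hypothetical) with a single enhanced cell in the middle.
toy_map = [[10.0] * 11 for _ in range(11)]
toy_map[5][5] = 30.0
print(is_overdense(toy_map, 5, 5), is_overdense(toy_map, 0, 0))
```

On a steep background gradient the scatter inside the frame grows as well, so the frame size has to be matched to the scale on which the background varies; this is the same sampling-window problem noted above.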

The algorithms differ strongly in their runtimes, with 
the slowest algorithm taking almost 200 times as long as 
the fastest one. In the configurations used for this study, the 
runtimes of the SC, VT, sVT, NN and MST algorithms com- 
pare to each other as 1:4:4:123:189. Even when using faster 
algorithms than the ones used for this study, this will con- 
stitute a serious difference. 

The computationally expensive NN algorithm partly
compensates for this by easily providing additional parameters
such as the density-weighted position of the centre or the
density radius (core radius). It is also useful, in particular
when varying j, for the study of the internal structure of
clusters, as it allows the identification of subclusters and the
exact location of density peaks. This can in principle also be
seen in stellar density maps from star counts, but at a much
coarser resolution.

Table 7 Performance of the algorithms (filled circle: correct
cluster identification, open circle: rough identification,
dash: no identification)

Model       SC    NN    VT    sVT   MST
R0.1_50     -     -     -     -     -
R0.1_100    o     o     -     -     -
R0.1_200    o     o     -     o     o
R0.1_500    •     •     o     •     •
R1.0_50     o     •     -     o     o
R1.0_100    •     •     o     o     •
R1.0_200    •     •     o     •     •
R1.5_50     •     •     •     •     •
R1.5_100    •     •     •     •     •
R1.5_200    •     •     •     •     •
E2_50       -     o     -     -     -
E2_100      o     •     -     -     o
E2_200      •     •     o     o     o
E3_50       o     o     -           o
E3_100      •     •     -           o
E3_200      •     •     o     o     •
F1.9_100    o     o     -           o
F1.9_200    o     •     -     o     o
F1.9_500    •     •     o     o     o
D100        o     o     -
D200        o     o     -           o
D500        •     •     o     o     o

The VT method is too sensitive to small fluctuations, as
it contains no inherent smoothing, unlike the NN method
(with $j \gg 1$) and the SC method (by binning the data).
It is therefore only able to detect rather distinct clusters;
clusters with a small density contrast compared to the
background are likely to be broken up into small fragments (see
Fig. 5) or not detected at all. While the lack of binning
or of assumptions on the shape of the structure makes the VT a
good tool to study small-scale density variations and highly
filamentary structures, it is less suited for typical star
clusters. Smoothing the density estimates over adjacent cells
improves the performance of the VT method, but it still
underestimates the cluster sizes for all but the densest clusters.
Given that it performs worse than the similar SC and NN
algorithms, the application of the VT and sVT method to
star clusters is discouraged.

The MST method is very sensitive to the value of $l_c$. The
choice of $l_c$ is crucial, much more so than the choice of
$\rho_\mathrm{thresh}$ in the SC or NN method. Like the NN method with small j
or the VT, it is too sensitive to small-scale (random) density
fluctuations. A wrong choice of $l_c$ easily leads to the
detection of numerous fake clusters or the break-up of single
clusters into several ones. Unfortunately, there seems to be
no generally applicable rule for finding an adequate $l_c$ value.
A value around $\bar{l}$ seems to be good for the discussed models
(one cluster in a much larger field of randomly distributed
sources), but may not be applicable to other cases. The MST
method is, however, a good method to 'play around' with
on certain areas, e.g. to study different clustering scales in
galaxies by varying the value for $l_c$, as has been
demonstrated for M33 and the Large Magellanic Cloud by Bastian
et al. (2007, 2009). As the MST is a one-dimensional
structure in a space of two or more dimensions, it may lead to the
incomplete detection of clusters elongated along the local
tree direction. While this and the lack of inherent smoothing
make the MST algorithm less feasible for typical star
clusters, it is more successful at identifying highly
filamentary structures (e.g. in the distribution of galaxies: Bhavsar
& Ling 1988a; Pearson & Coles 1995).

As all algorithms have their specific strengths and weak- 
nesses, the choice of the method should depend on the size 
and character of the data set and the purpose of the study. 
For large-scale investigations (e.g. on all-sky or wide-field 
surveys) the computing time plays a considerable role, mak- 
ing the NN and MST methods less feasible. The SC and 
MST methods require an a priori choice of parameters (bin 
size and l c , respectively), which may be difficult in particu- 
lar for the analysis of large data sets or regions with highly 
varying stellar density. Nevertheless, for large fields, a star 
count algorithm with refinements or additional investiga- 
tions of the cluster candidates is probably the best choice. 
On smaller scales, in particular for embedded clusters in a
molecular cloud, the NN method makes sense, since it is
more capable than the other methods of detecting hierarchical
clusters or clusters without a clear radial density gradient,
as is often the case for young clusters. It is
recommended in particular when additional cluster parameters or
information on the internal structure are desired.

In any case it should be kept in mind that all discussed
algorithms only detect stellar density enhancements and do
not provide information on whether the identified objects are
physically related clusters. Additional tests, such as an
expectation-maximization algorithm fitting Gaussian profiles
to potential clusters (Mercer et al. 2005; Froebrich et al.
2010), colour-magnitude diagrams or kinematical
information, can be used to constrain the results, at least for evolved
open clusters. For embedded clusters, which are usually
surrounded by a halo of similar YSOs and often do not show
a smooth density profile, these criteria may not be
applicable, and the identification of embedded clusters will remain
somewhat arbitrary and strongly depend on the definition.